Parallel Sampling of HDPs using Sub-Cluster Splits

Authors

  • Jason Chang
  • John W. Fisher

Abstract

We develop a sampling technique for hierarchical Dirichlet process (HDP) models. The parallel algorithm builds upon [1] by proposing large split and merge moves based on learned sub-clusters. In our experiments, these additional global split and merge moves drastically improve convergence. Furthermore, we find that cross-validation techniques do not adequately assess convergence, and that previous sampling methods converge more slowly than previously expected.
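The idea behind a sub-cluster split can be illustrated with a Metropolis-Hastings acceptance test. The sketch below is a hedged illustration for a Dirichlet process (DP) mixture rather than the HDP, and it is not the authors' code. It assumes a 1-D Gaussian likelihood with known variance `sigma2` and a conjugate N(0, `tau2`) prior on each cluster mean, and it scores the proposal that promotes a cluster's two sub-clusters (`left`, `right`) to full clusters using the DP partition-prior ratio α Γ(N_l) Γ(N_r) / Γ(N) times the ratio of marginal likelihoods. All function names and parameters are hypothetical, and the exact Hastings ratio in the paper, including any proposal terms and HDP-specific factors, may differ.

```python
import math
from typing import List


def log_marginal(xs: List[float], sigma2: float = 1.0, tau2: float = 100.0) -> float:
    """Log marginal likelihood of xs with x_i ~ N(mu, sigma2) and mu ~ N(0, tau2) integrated out."""
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    # Integrating out mu gives covariance sigma2 * I + tau2 * 1 1^T. Its log-determinant
    # follows from the matrix determinant lemma, its inverse from Sherman-Morrison.
    logdet = n * math.log(sigma2) + math.log(1.0 + n * tau2 / sigma2)
    quad = (ss - tau2 * s * s / (sigma2 + n * tau2)) / sigma2
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)


def log_split_ratio(left: List[float], right: List[float], alpha: float = 1.0,
                    sigma2: float = 1.0, tau2: float = 100.0) -> float:
    """Log of alpha * Gamma(N_l) f(x_l) * Gamma(N_r) f(x_r) / (Gamma(N) f(x))."""
    if not left or not right:
        raise ValueError("both sub-clusters must be non-empty")
    n_l, n_r = len(left), len(right)
    # DP (Chinese restaurant process) partition-prior ratio for splitting one cluster in two.
    log_prior = math.log(alpha) + math.lgamma(n_l) + math.lgamma(n_r) - math.lgamma(n_l + n_r)
    # Marginal-likelihood ratio: two separate clusters versus the merged cluster.
    merged = list(left) + list(right)
    log_lik = (log_marginal(left, sigma2, tau2) + log_marginal(right, sigma2, tau2)
               - log_marginal(merged, sigma2, tau2))
    return log_prior + log_lik


def split_accept_prob(log_h: float) -> float:
    """Metropolis-Hastings acceptance probability min(1, H), computed without overflow."""
    return math.exp(min(0.0, log_h))


# Usage with hypothetical data: well-separated sub-clusters versus identical points.
well_separated = log_split_ratio([-5.0] * 10, [5.0] * 10)
identical = log_split_ratio([0.0] * 10, [0.0] * 10)
print(split_accept_prob(well_separated), split_accept_prob(identical))
```

With well-separated sub-clusters the log ratio is large and positive, so the split is always accepted; splitting twenty identical points into two halves gives a negative log ratio, so that split is almost always rejected. Working in log space avoids overflow when the likelihood gain is large.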

Similar Resources

Supplemental Material for Parallel Sampling of HDPs using Sub-Cluster Splits

In the following supplemental material we provide some additional details and derivations for the paper. We begin by showing how to calculate the joint distribution p(β, z), marginalizing out π, in Section 1. Then, in Section 2, we examine the joint log-likelihoods of HDP topic models and show that the typical set of the distribution is very far from the mode. In Sections 3-4, we give...
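For orientation, one standard way to write this marginal is sketched below. It assumes that document j's topic weights π_j are drawn from DP(α, β), that n_{jk} counts the tokens in document j assigned to topic k, and that N_j = Σ_k n_{jk}; the symbols α, n_{jk}, N_j, and K (the number of instantiated topics) are introduced here for illustration and may not match the supplement's notation. Integrating out each π_j yields a Dirichlet-multinomial form over the instantiated topics, and the leftover mass of β contributes a factor of 1 because its count is zero:

$$
p(\beta, z) = p(\beta) \prod_{j} \frac{\Gamma(\alpha)}{\Gamma(\alpha + N_j)} \prod_{k=1}^{K} \frac{\Gamma(\alpha \beta_k + n_{jk})}{\Gamma(\alpha \beta_k)}
$$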

Parallel Sampling of DP Mixture Models using Sub-Cluster Splits

We present an MCMC sampler for Dirichlet process mixture models that can be parallelized to achieve significant computational gains. We combine a nonergodic, restricted Gibbs iteration with split/merge proposals in a manner that produces an ergodic Markov chain. Each cluster is augmented with two subclusters to construct likely split moves. Unlike some previous parallel samplers, the proposed s...
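The sub-cluster augmentation can be pictured through the assignment step for a single point. The sketch below is a hedged illustration, not the authors' code: it assumes a 1-D Gaussian likelihood with known variance and computes, for a point `x` already assigned to some cluster, the posterior over that cluster's two sub-clusters in proportion to each sub-cluster's weight times its likelihood. The names `subcluster_probs` and `sample_subcluster`, and the fixed sub-cluster weights and means passed in, are hypothetical; in a full sampler these parameters would themselves be resampled each iteration.

```python
import math
import random
from typing import List, Sequence


def log_normal(x: float, mean: float, var: float) -> float:
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def subcluster_probs(x: float, sub_weights: Sequence[float], sub_means: Sequence[float],
                     var: float = 1.0) -> List[float]:
    """Posterior probability that x belongs to each sub-cluster of its current cluster."""
    logs = [math.log(w) + log_normal(x, m, var) for w, m in zip(sub_weights, sub_means)]
    top = max(logs)  # subtract the maximum before exponentiating for numerical stability
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def sample_subcluster(x: float, sub_weights: Sequence[float], sub_means: Sequence[float],
                      rng: random.Random, var: float = 1.0) -> int:
    """Draw a sub-cluster label (0 = left, 1 = right) for x."""
    probs = subcluster_probs(x, sub_weights, sub_means, var)
    return 0 if rng.random() < probs[0] else 1
```

Because the point only chooses between the two sub-clusters of its own cluster, this step never changes the number of clusters; in the sampler described above, changes in cluster count come from the separate split and merge proposals.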

Collapsed Gibbs Sampling for Latent Dirichlet Allocation on Spark

In this paper we implement a collapsed Gibbs sampling method for the widely used latent Dirichlet allocation (LDA) model on Spark. Spark is a fast in-memory cluster computing framework for large-scale data processing, which has been the talk of the Big Data town for a while. It is suitable for iterative and interactive algorithms. Our approach splits the dataset into P × P partitions, shuffles a...
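The abstract is cut off before describing how the P × P partitions are processed. One common scheme for partitioned parallel LDA, assumed here purely for illustration, divides documents into P blocks and the vocabulary into P blocks, then runs P rounds in which each of P workers handles one (document block, word block) cell, such that no two workers in the same round share a document block or a word block. Whether the Spark implementation uses exactly this schedule is not stated in the excerpt, and `diagonal_schedule` is a hypothetical helper.

```python
from typing import List, Tuple


def diagonal_schedule(p: int) -> List[List[Tuple[int, int]]]:
    """Split a P x P grid of (document block, word block) cells into P conflict-free rounds."""
    if p < 1:
        raise ValueError("p must be positive")
    # In round r, worker d handles document block d and word block (d + r) mod p,
    # so no two workers in a round touch the same documents or word-topic counts.
    return [[(d, (d + r) % p) for d in range(p)] for r in range(p)]


# Usage: for p = 3, round 1 pairs each document block with the next word block.
print(diagonal_schedule(3)[1])  # [(0, 1), (1, 2), (2, 0)]
```

Over the P rounds every cell of the grid is visited exactly once, so one pass of the schedule sweeps all tokens while each round can update its word-topic counts without locking.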

Supplemental Material for Parallel Sampling of DP Mixture Models using Sub-Clusters Splits

In this section, we show the derivation of the posterior distribution over cluster weights, π, conditioned on the cluster labels, z. We begin with the definition of a Dirichlet process from [1]. Definition A.1 (Dirichlet Process). Let H be a measure on a measurable space, Ω. If for any finite partition, (A1, A2, …, AK) of the space, the measure, G, on the partition follows the following D...
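For reference, the standard Dirichlet process definition that the truncated sentence appears to be building toward, in the common parameterization with concentration α > 0 and base probability measure H, is sketched below together with the resulting posterior over cluster weights. The symbols α, N_k, and π̃_{K+1} are assumed notation, since the excerpt ends before introducing them. In the second line, K counts the occupied clusters, N_k is the number of labels z_i = k, and π̃_{K+1} is the total weight of all unoccupied clusters:

$$
G \sim \mathrm{DP}(\alpha, H) \iff \big(G(A_1), \ldots, G(A_K)\big) \sim \mathrm{Dir}\big(\alpha H(A_1), \ldots, \alpha H(A_K)\big) \ \text{for every finite measurable partition } (A_1, \ldots, A_K) \text{ of } \Omega
$$

$$
(\pi_1, \ldots, \pi_K, \tilde{\pi}_{K+1}) \mid z \sim \mathrm{Dir}(N_1, \ldots, N_K, \alpha)
$$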

Publication date: 2014